Textual data comes from various sources
- Blogs, forums, reviews
- News websites, digitalized newspapers
- Public statements, reports, interviews, transcripts
=> Let’s see an example using textual data to monitor the state of the economy
2022-06-04
Question: Does textual data improve the assessment of how likely a rate increase is in the coming months? What about non-standard monetary policies?
=> Texts contain multiple themes that convey different positions/sentiments
How to extract this information? How to recover the author’s sentiment on a specific idea?
We need a tool that leverages both:
=> Today’s focus is on the creation of topic-dependent time series
It is relatively easy to retrieve the full text of ECB press conferences since the creation of the ECB. There are about 260 press conferences between 1998 and 2022.
For example, the Loughran-McDonald lexicon for the analysis of financial textual data:
## List of 2
##  $ negative: chr [1:2355] "abandon" "abandoned" "abandoning" "abandonment" ...
##  $ positive: chr [1:354] "able" "abundance" "abundant" "acclaimed" ...
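The core of lexicon-based sentiment analysis is a simple count: net sentiment is the difference between positive and negative word hits, normalized by text length. A minimal Python sketch, using a tiny illustrative subset of Loughran-McDonald-style word lists (the real lexicon has thousands of entries, and the accompanying material uses R):

```python
# Toy word lists standing in for the Loughran-McDonald lexicon (illustrative only).
NEGATIVE = {"abandon", "abandoned", "weak", "decline"}
POSITIVE = {"able", "abundant", "improvement", "strong"}

def lexicon_sentiment(text: str) -> float:
    """Net sentiment: (#positive - #negative) / #tokens, in [-1, 1]."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens) if tokens else 0.0

score = lexicon_sentiment("credit growth remained weak despite a strong improvement")
# 2 positive hits, 1 negative hit, 8 tokens -> (2 - 1) / 8 = 0.125
```

Real applications add tokenization, negation handling, and term weighting, but the scoring logic stays this simple.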
A wordcloud of the most frequent positive and negative words found in the press conferences corpus.
By fitting a topic model on the ECB press conferences, each paragraph is assigned a vector of proportions indicating the predominant topics in that paragraph.
This can also be used to identify the paragraphs most relevant to each topic:
##                                        prob paragraph
## Inflation policy                      0.978 "To sum up, based on its regular economic and monetary analyses, t"
## Monetary policy analysis and strategy 0.949 "In the context of the first pillar of our monetary policy strateg"
## Lending analysis and operations       0.958 "There has been little change in credit growth, which remained wea"
## Eurozone and reforms                  0.949 "The Governing Council recognises the key role of the Next Generat"
## Economic growth                       0.973 "Let me now explain our assessment in greater detail, starting wit"
## Uncertainty                           0.932 "In the Governing Council’s assessment, the risks to the economic "
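Selecting the most relevant paragraph per topic is an argmax over the estimated topic proportions. A hypothetical Python sketch (the paragraph ids and proportions below are invented for illustration; the document's own analysis is done in R):

```python
# theta[i][k]: estimated proportion of topic k in paragraph i (rows sum to 1).
theta = {
    "para_1": [0.80, 0.15, 0.05],  # mostly topic 0
    "para_2": [0.10, 0.85, 0.05],  # mostly topic 1
    "para_3": [0.20, 0.10, 0.70],  # mostly topic 2
}

def top_paragraph(theta: dict, k: int):
    """Return (paragraph id, proportion) of the paragraph most relevant to topic k."""
    pid = max(theta, key=lambda i: theta[i][k])
    return pid, theta[pid][k]
```

Applied to each topic in turn, this reproduces a table like the one above: one high-probability exemplar paragraph per topic.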
I focus on a method incorporating the independent results of sentiment analysis and topic modeling.
The quantity of interest is the topical sentiment attention \(c_{k,t}\) at any point in time. This represents the sentiment effectively conveyed by topic \(k\) in period \(t\).
The quantity \(c_{k,t}\) is distinct from the sentiment \(s_t\) measured at the same period, in the sense that it only accounts for the expressed sentiment relevant to topic \(k\).
My hope is that the quantity \(c_{k,t}\) proves to be more useful for economic applications than the sentiment \(s_t\).
At the document level (document \(i\) of period \(t\)):
Topic proportions can be used to select relevant documents. It is possible to use \(\theta_{t,i,k}\) directly in the time-series aggregation.
We assume that the measured sentiment \(s_{t,i}\) comes from the sentiment conveyed by each topic, defined by the product of the topic-specific sentiment and the topical proportion:
\[ \underbrace{s_{t,i}}_{\substack{\text{sentiment}\\\text{analysis}}} = \sum^K_{k=1} \overbrace{s_{t,i,k}}^{??} \times \underbrace{\theta_{t,i,k}}_{\substack{\text{topic}\\\text{modeling}}} \]
As we only know the estimated \(s_{t,i}\) from sentiment analysis and the topic proportions \(\theta_{t,i,k}\) from topic modeling independently, there is not a single solution for the values of \(s_{t,i,k}\).
To solve this problem, we assume that the topic-specific sentiment \(s_{t,i,k}\) is constant across topics within a text and equal to the text’s sentiment \(s_{t,i}\). Formally, \(s_{t,i,k} = s_{t,i}, \forall k \in K\).
This assumption is plausible for very short texts and implausible for long ones.
On what dimension should topic modeling be applied?
In consequence, we compute the intermediate quantity \(c_{t,i,k} = s_{t,i,k} \times \theta_{t,i,k}\), that we define as topical sentiment attention. This quantity represents the effective sentiment conveyed by a topic in a given document.
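Under the identification assumption \(s_{t,i,k} = s_{t,i}\), the document-level computation reduces to scaling the topic proportions by the document's sentiment. A minimal Python sketch (the document's own implementation is in R):

```python
# c_{t,i,k} = s_{t,i} * theta_{t,i,k}: topical sentiment attention of one document.
def topical_sentiment_attention(s_doc: float, theta_doc: list) -> list:
    return [s_doc * theta_k for theta_k in theta_doc]

s_doc = 0.4                  # measured sentiment of the document (toy value)
theta_doc = [0.5, 0.3, 0.2]  # topic proportions, summing to 1 (toy values)
c_doc = topical_sentiment_attention(s_doc, theta_doc)

# Because the proportions sum to one, summing c over topics recovers s_doc,
# which is exactly the decomposition identity above.
assert abs(sum(c_doc) - s_doc) < 1e-12
```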
This quantity might also be aggregated into time series for a sampling period \(t\) by taking the mean across documents:
\[ c_{t,k} = \frac{\sum_{i = 1}^{N_t} c_{t,i,k}}{N_t} \]
and it follows that \(s_t = \sum_{k=1}^K c_{t,k}\). In other words, this operation breaks the measured sentiment at a point in time into topic-specific sentiment quantities.
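The aggregation step above can be sketched as a plain average over documents, with the decomposition \(s_t = \sum_k c_{t,k}\) checked numerically (illustrative Python with toy values, not the document's R code):

```python
# Aggregate document-level c_{t,i,k} into a period value c_{t,k} by averaging
# across the N_t documents of period t.
def aggregate(c_docs: list) -> list:
    n, K = len(c_docs), len(c_docs[0])
    return [sum(doc[k] for doc in c_docs) / n for k in range(K)]

c_docs = [[0.20, 0.12, 0.08],   # c_{t,i,k} for document 1 (toy values)
          [-0.10, 0.05, 0.05]]  # c_{t,i,k} for document 2 (toy values)
c_t = aggregate(c_docs)         # topic-wise series values for period t
s_t = sum(c_t)                  # summing over topics recovers the period sentiment

# s_t equals the mean of the per-document sentiments (0.40 and 0.00 here).
assert abs(s_t - 0.2) < 1e-12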
The Picault & Renault (2017) lexicon has been specifically developed to analyze the sentiment conveyed by the ECB press conferences. It identifies n-grams (sequences of words) and classifies them into six categories.
Applying the Picault & Renault lexicon gives two sentiment values for each paragraph:
##          date                                          paragraph     MP     EC
##        <char>                                             <char> <char> <char>
## 1: 1998-06-09 (ii) it provisionally agreed on a budget for the E -1.695    0.8
## 2: 1998-06-09 (iii) it agreed on the framework for the organisat      0      0
## 3: 1998-06-09 Furthermore, the Governing Council decided on two  -2.633      2
## 4: 1998-06-09 (ii) with respect to the euro banknotes, a largely  -3.39    1.6
Time series are formed by averaging the measured sentiment across paragraphs, yielding MP (Monetary Policy) and EC (Economic Condition) sentiment measures for each press conference date.
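The per-date averaging can be sketched with a simple group-by over paragraphs (stdlib Python with toy values; the document itself works in R/data.table):

```python
from collections import defaultdict

# (date, MP score, EC score) per paragraph -- toy values for illustration.
rows = [
    ("1998-06-09", -1.695, 0.8),
    ("1998-06-09", 0.0, 0.0),
    ("1998-07-09", -2.0, 1.0),
]

def per_date_means(rows):
    """Average MP and EC scores across the paragraphs of each date."""
    groups = defaultdict(list)
    for date, mp, ec in rows:
        groups[date].append((mp, ec))
    return {d: (sum(m for m, _ in g) / len(g), sum(e for _, e in g) / len(g))
            for d, g in groups.items()}

series = per_date_means(rows)
# One (MP, EC) pair per press-conference date.
```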
Black line is the EC sentiment. The filled area is the contribution of each topic to the EC sentiment.
Defining the ECB’s decision:
How do we model the ECB’s decision? Following Picault & Renault (2017), we use a forward-looking Taylor monetary policy rule augmented with sentiment. Since \(\text{Decision}_t\) is a discrete outcome, we use an ordinal probit model with the latent variable defined as:
\[ \text{Decision}_t^* = \alpha + \beta_1 s^{\text{EC}}_t + \beta_2 s^{\text{MP}}_t + X_t^{T}\beta + \epsilon_t, \]
with \(X_t\) a set of macroeconomic variables:
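The ordinal structure maps the continuous latent \(\text{Decision}_t^*\) to a discrete decision through estimated thresholds. A hypothetical Python sketch; the coefficients, thresholds, and labels below are invented for illustration, not estimated values from the paper:

```python
# Ordinal mapping: the decision is determined by which interval between
# thresholds the latent variable falls into (cut / hold / hike).
def decision(latent: float, thresholds=(-1.0, 1.0)) -> str:
    labels = ("cut", "hold", "hike")
    return labels[sum(latent > c for c in thresholds)]

# latent = alpha + b1 * s_EC + b2 * s_MP + X @ beta (+ noise); toy example:
alpha, b1, b2 = 0.0, 2.0, 0.5     # illustrative coefficients
latent = alpha + b1 * 0.7 + b2 * 0.2   # = 1.5, above the upper threshold
```

Estimation of the thresholds and coefficients is what the ordinal probit does; libraries such as statsmodels or MASS (R) handle that part.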
To test the significance of the topic-specific sentiment, we compare models with and without the topical sentiment term:
|  | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| 12-month inflation expectation | 2.870*** (0.671) | 2.383*** (0.707) | 2.480*** (0.723) | 2.595*** (0.660) | 2.092** (0.695) | 2.237** (0.711) |
| EC Sentiment | | 1.873*** (0.431) | 3.224*** (0.658) | | 1.806*** (0.432) | 3.191*** (0.683) |
| MP Sentiment | | 0.301 (0.216) | 0.313 (0.220) | | 0.157 (0.214) | 0.168 (0.218) |
| (EC Sentiment \| Monetary policy analysis and strategy topic) - (EC Sentiment) | | | 2.551** (0.900) | | | 2.486** (0.918) |
| \(N\) | 238 | 238 | 238 | 237 | 237 | 237 |
| Pseudo-\(R^2\) | 0.119 | 0.206 | 0.235 | 0.117 | 0.191 | 0.218 |
As in Picault & Renault (2017), the Economic Condition sentiment is highly significant. Unlike them, however, the Monetary Policy sentiment is not significant.
The addition of topical information to the sentiment is significant. This is an encouraging result for this methodology.
sentopics package